A Comparison of Unsupervised Bilingual Term Extraction Methods Using Phrase-Tables

نویسندگان

  • Masamichi Ideue
  • Kazuhide Yamamoto
  • Masao Utiyama
  • Eiichiro Sumita
چکیده

Automatic bilingual term extraction is essential for providing a consistent bilingual term list for human translators engaged in translating a set of documents. We compare three statisticalmeasures for extracting bilingual terms from a phrase-table built from a parallel corpus. We show that these measures extract different bilingual term candidates and a combination of these measures ranks valid bilingual terms highly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using word2vec for Bilateral Translation

Word and phrase tables are key inputs to machine translations, but costly to produce. New unsupervised learning methods represent words and phrases in a high-dimensional vector space, and these monolingual embeddings have been shown to encode syntactic and semantic relationships between language elements. The information captured by these embeddings can be exploited for bilingual translation by...

متن کامل

Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation

Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminolo...

متن کامل

Bilingual Data Cleaning for SMT using Graph-based Random Walk

The quality of bilingual data is a key factor in Statistical Machine Translation (SMT). Low-quality bilingual data tends to produce incorrect translation knowledge and also degrades translation modeling performance. Previous work often used supervised learning methods to filter lowquality data, but a fair amount of human labeled examples are needed which are not easy to obtain. To reduce the re...

متن کامل

Learning Better Rule Extraction with Translation Span Alignment

This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree tr...

متن کامل

Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language

This paper proposes a novel method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT). Given two bilingual lexicons between language pairs Lf–Lp and Lp–Le, we assume these lexicons as parallel corpora. Then, we merge the extracted two phrase tables into one phrase table between Lf and Le. Finally, we construct a phrase-based SMT...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011